    Badrish Chandramouli
    @badrishc

    We might also consider integrating this functionality right into FasterKV in future. Records in FasterKV are versioned, and checkpoints/commits move the "current version" forward, e.g., from X to X+1.

    So, we might allow all upserts to happen on version X+1 (which is the current uncommitted database version). Further, we would need to add a ReadCommitted() operation that would retrieve the X (or older) version even if there is an X+1 version available. This would provide atomic visibility of all X+1 records at the instant we commit and move the current DB version forward (to X+2).
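The versioning idea described above can be illustrated with a small toy model. This is only a sketch and not FasterKV code: the `VersionedStore` class and its `Upsert`, `ReadCommitted`, and `Commit` members are hypothetical names chosen for illustration, and the real design (if it is ever added) may look quite different.

```csharp
using System;
using System.Collections.Generic;

// Toy model only (hypothetical names, not the FasterKV API).
// Each key keeps a chain of (version, value) records, oldest first.
public sealed class VersionedStore
{
    private readonly Dictionary<string, List<(long Version, string Value)>> records =
        new Dictionary<string, List<(long Version, string Value)>>();

    // Upserts land on CurrentVersion; every version below it is committed.
    public long CurrentVersion { get; private set; } = 1;

    public void Upsert(string key, string value)
    {
        if (!records.TryGetValue(key, out var chain))
        {
            chain = new List<(long Version, string Value)>();
            records[key] = chain;
        }

        // Overwrite in place if the key was already written in the uncommitted version.
        if (chain.Count > 0 && chain[chain.Count - 1].Version == CurrentVersion)
            chain[chain.Count - 1] = (CurrentVersion, value);
        else
            chain.Add((CurrentVersion, value));
    }

    // Returns the newest record whose version is older than CurrentVersion,
    // even if a newer uncommitted record exists for the same key.
    public bool ReadCommitted(string key, out string value)
    {
        value = null;
        if (!records.TryGetValue(key, out var chain)) return false;
        for (int i = chain.Count - 1; i >= 0; i--)
        {
            if (chain[i].Version < CurrentVersion)
            {
                value = chain[i].Value;
                return true;
            }
        }
        return false;
    }

    // Committing moves the current version forward, which makes all pending
    // writes visible to ReadCommitted at the same instant.
    public void Commit() => CurrentVersion++;
}

public static class VersionedStoreDemo
{
    public static void Main()
    {
        var store = new VersionedStore();
        store.Upsert("a", "a1");
        store.Upsert("b", "b1");
        Console.WriteLine(store.ReadCommitted("a", out _)); // nothing is committed yet

        store.Commit();                 // "a1" and "b1" become visible together
        store.Upsert("a", "a2");        // uncommitted write on the new version
        store.ReadCommitted("a", out var committed);
        Console.WriteLine(committed);   // still the committed value, not "a2"
    }
}
```

The point of the sketch is that visibility depends only on a single version counter, so one increment publishes every pending record atomically.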

    Masashi Ito
    @mito-csod
    hi, i see there is a new interface member called GetLength<Input>(...), what is this supposed to do? i find it a bit confusing because i look at https://github.com/microsoft/FASTER/blob/9b96efc9a0add6a4b9cfed2bfae3507ea13efb6e/cs/test/VLTestTypes.cs#L77 and it doesn't seem to matter what Input is. I thought the intent might be to say "how many typeof(Input)s can you fit in a VLValue", but i couldn't quite make sense of it.
    Badrish Chandramouli
    @badrishc
    It is used when you are doing a read-modify-write (RMW) operation. Recall that the RMW call takes "Input" as a parameter. If we are updating a Value based on Input, we need to determine what the size of the new Value would be (after "adding" Input to the old Value). This is determined by GetLength<Input>(...). The default implementation of this would be to call the Input-agnostic GetLength call.
    Masashi Ito
    @mito-csod
    ok. apologies if i'm being obtuse--why is it that in the implementation of VLValue.GetLength<Input>() I linked above, there is no dependency on Input for the resulting length? Is it because t.length already reflects the post-RMW length?
    Badrish Chandramouli
    @badrishc
    Yes, that's right in this example. It is indeed a bit odd for GetLength to ask for this input-specific information here at IVariableLengthStruct. Perhaps we should move it to functions, but that would be another breaking change.
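To make the role of the Input-aware length call concrete, here is a toy sketch where the RMW operation appends the input bytes to the value, so the post-RMW size really does depend on Input. The interface `IToyVariableLength<T>` and the class `AppendingBytesLength` are hypothetical stand-ins written for this example; they are not the actual `IVariableLengthStruct` definition, and the 4-byte length prefix is an assumption.

```csharp
using System;

// Hypothetical, simplified stand-in for a variable-length helper (not the FASTER interface).
public interface IToyVariableLength<T>
{
    // Size in bytes of an existing value.
    int GetLength(ref T value);

    // Size in bytes the value would need after an RMW that applies 'input' to it.
    int GetLength<TInput>(ref T value, ref TInput input);
}

// Toy value layout (assumed): a 4-byte length prefix followed by the payload bytes.
public sealed class AppendingBytesLength : IToyVariableLength<byte[]>
{
    public int GetLength(ref byte[] value) => sizeof(int) + value.Length;

    public int GetLength<TInput>(ref byte[] value, ref TInput input)
    {
        // In this toy, RMW means "append input to value", so the new size grows by the input size.
        if (input is byte[] extra)
            return GetLength(ref value) + extra.Length;

        // Default behaviour: fall back to the Input-agnostic length.
        return GetLength(ref value);
    }
}

public static class GetLengthDemo
{
    public static void Main()
    {
        var lengths = new AppendingBytesLength();
        byte[] value = { 1, 2, 3 };
        byte[] input = { 4, 5 };
        int unrelatedInput = 42;

        Console.WriteLine(lengths.GetLength(ref value));                     // current size
        Console.WriteLine(lengths.GetLength(ref value, ref input));          // size after appending
        Console.WriteLine(lengths.GetLength(ref value, ref unrelatedInput)); // falls back to current size
    }
}
```

In the linked test type the second overload ignores Input because the stored length already reflects the post-RMW size; the sketch shows the other case, where it does not.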
    Masashi Ito
    @mito-csod
    got it, thank you!
    Masashi Ito
    @mito-csod
    i totally missed that the package name was changed. very nice updates with the support for iasyncenumerable and reorganization of functionality!
    now, this is kind of a trivial concern, and it's not even a technical one, but... FASTER is a very search-unfriendly name for a library :/ case in point, i couldn't even find the new package by searching for "microsoft faster" on nuget. it's just too ubiquitous of an adjective lol...
    Badrish Chandramouli
    @badrishc
    Yes, we have updated the NuGet sources and added lots of new features and product enhancements recently. It is time to try it out again if you have used it in the past. All docs have been updated as well. Check out this post for details: microsoft/FASTER#320
    Regarding the name, yes, it's a bit generic. Thankfully, Google searches for "microsoft faster" are returning what we expect :).
    Masashi Ito
    @mito-csod
    ah what do you know, it does. i could swear just a few months ago, it didn't come out like this.
    i remember getting links to sql server and stuff at the top...
    Masashi Ito
    @mito-csod
    i've started playing with fasterkv, and i'm a bit confused as to how log compaction is supposed to be used--if i have a high churn rate of some keys but not others, how do i determine what address is safe to compact to? is the only way to guarantee that i have at least one copy of every key value pair to do a snapshot first?
    Badrish Chandramouli
    @badrishc
    You can compact the log from BeginAddress to any address (say NEWBEGIN) you want on disk. FASTER will write the live records to the tail in memory. To avoid data loss in memory, you will call Compact with parameter shiftBeginAddress = false. This will ensure we do NOT shift the begin address after compaction to tail. Then, you take a checkpoint (foldover is fine). Finally, you call ShiftBeginAddress to address NEWBEGIN.
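Why the order "compact without shifting, checkpoint, then shift" matters can be seen in a toy model of the log. This is a sketch under simplifying assumptions, not FASTER code: `ToyLog`, `DurableUntil`, and `KeysSurvivingCrash` are hypothetical names, addresses are list indexes, and a checkpoint is modeled as simply making everything up to the tail durable.

```csharp
using System;
using System.Collections.Generic;

// Toy model only (hypothetical names, not the FASTER API).
public sealed class ToyLog
{
    private readonly List<(string Key, string Value)> records = new List<(string Key, string Value)>();
    private readonly Dictionary<string, int> index = new Dictionary<string, int>();

    public int BeginAddress { get; private set; }
    public int TailAddress => records.Count;
    public int DurableUntil { get; private set; } // addresses below this are "on disk"

    public void Upsert(string key, string value)
    {
        index[key] = records.Count; // the index always points at the newest record
        records.Add((key, value));
    }

    // Copies live records in [BeginAddress, untilAddress) to the in-memory tail.
    public void Compact(int untilAddress, bool shiftBeginAddress)
    {
        for (int address = BeginAddress; address < untilAddress; address++)
        {
            var record = records[address];
            if (index[record.Key] == address) // live: no newer record for this key
                Upsert(record.Key, record.Value);
        }
        if (shiftBeginAddress) BeginAddress = untilAddress;
    }

    public void Checkpoint() => DurableUntil = TailAddress;

    public void ShiftBeginAddress(int address) => BeginAddress = address;

    // Keys that still have a durable record in [BeginAddress, DurableUntil).
    public ISet<string> KeysSurvivingCrash()
    {
        var keys = new HashSet<string>();
        for (int address = BeginAddress; address < DurableUntil; address++)
            keys.Add(records[address].Key);
        return keys;
    }
}

public static class CompactionDemo
{
    private static ToyLog Build()
    {
        var log = new ToyLog();
        log.Upsert("cold", "c1"); // address 0, written once
        log.Upsert("hot", "h1");  // address 1
        log.Upsert("hot", "h2");  // address 2
        log.Upsert("hot", "h3");  // address 3
        log.Checkpoint();         // everything so far is durable
        return log;
    }

    public static void Main()
    {
        // Safe order: compact without shifting, checkpoint, then shift.
        var safe = Build();
        safe.Compact(3, shiftBeginAddress: false);
        safe.Checkpoint();
        safe.ShiftBeginAddress(3);
        Console.WriteLine(string.Join(",", safe.KeysSurvivingCrash()));

        // Unsafe order: shifting before the copied records are durable.
        var unsafeLog = Build();
        unsafeLog.Compact(3, shiftBeginAddress: true);
        Console.WriteLine(string.Join(",", unsafeLog.KeysSurvivingCrash()));
    }
}
```

In the unsafe run, the only durable copy of the rarely written key sits below the new begin address while its compacted copy exists only in memory, which is the data-loss window the checkpoint step closes.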
    Masashi Ito
    @mito-csod
    thank you! will read up. the compatibility with foldover was going to be my next question :D
    Ralph
    @ralphbecket
    Hi, I think I'm embarrassed to ask, but should I expect to see anything written to disc after calling TakeHybridLogCheckpoint()? I'm feeling a little lost in this API, to be honest. My mental picture is that FASTER is basically a key/value store that periodically can be made to update its state on disc (and, as expected, there are all sorts of hoops to jump through for generality/performance and all sorts of things to tweak if you absolutely have to get down to brass tacks). I'm afraid I can't work out how to make the latter part work.
    Badrish Chandramouli
    @badrishc
    No question is embarrassing here. Faster indeed can be a bit overwhelming at first with all the options. We have taken significant steps to simplify the common use cases, with the new set of samples. Several of those samples take checkpoints.
    Your mental picture of Faster KV is exactly right. When you call "await TakeHybridLogCheckpointAsync(...)" you should expect to see the data being written to disk.
    Make sure you have configured the log device with the correct path, and the checkpoint path, when creating the Faster KV instance.
    Badrish Chandramouli
    @badrishc
    Please let us know if this doesn't work for you, preferably with a simple repro so we can diagnose more. Any feedback on API improvements is appreciated as well. The samples are at https://github.com/microsoft/FASTER/tree/master/cs/samples
    Ralph
    @ralphbecket
    Thank you for replying so quickly. I think my confusion has taken a turn for the worse, unfortunately. I wasn't aware there even was a checkpoint path! Looking carefully, I can see one now. So, there must be something wrong with my mental model of FASTER. What I thought was that the "log" is the on-disc persistent state of the KV store and that a checkpoint was a manual instruction to the in-memory KV store "hey, go and update the on-disc version with all the changes that have occurred since the last checkpoint". Now, I'm not seeing any file being written anywhere (I would expect to see the log file appear somewhere) and I'm not seeing any files being written or errors or exceptions when I call TakeHybridLogCheckpoint (from my perspective it looks like a noop, but that can't be right). As I say, I haven't created my FasterKV instance with a CheckpointSettings because I wasn't aware of them.
    Ralph
    @ralphbecket
    Here's a potted version of what I'm doing:
    // Set things going...
    var device = Devices.CreateLogDevice(@"C:\Temp\Foo.log");
    var kv = new FasterKV<MyThingKey, MyThing>(
        size: 1L << 20,
        logSettings: new LogSettings { LogDevice = device },
        serializerSettings: new SerializerSettings<MyThingKey, MyThing> {
            keySerializer = () => new MyThingKeySerializer(),
            valueSerializer = () => new MyThingSerializer()
        }
    );
    // I have plain old string keys, but unfortunately I have to wrap them
    // to satisfy IFasterEqualityComparer<MyThingKey>.
    var session = kv.NewSession(new SimpleFunctions<MyThingKey, MyThing>());
    
    // Carry out some updates...
    {
        var key = new MyThingKey("abc");
        var status = session.Upsert(ref key, ref myThingAbc1, Empty.Default, 0);
    }
    {
        var key = new MyThingKey("def");
        var status = session.Upsert(ref key, ref myThingDef, Empty.Default, 0);
    }
    {
        var key = new MyThingKey("abc");
        var status = session.Upsert(ref key, ref myThingAbc2, Empty.Default, 0);
    }
    
    // Update the on-disc data.
    var cpguid = default(Guid);
    kv.TakeHybridLogCheckpoint(out cpguid);
    
    // At this point, at least, I would expect to see C:\Temp\Foo.log,
    // alas I don't.
    Badrish Chandramouli
    @badrishc
    You won't see the log on disk as long as all your data fits in main memory. You would have allocated the main memory part of the log during store creation.
    FASTER log is a single sequence of records that spans disk and main memory. As main memory fills up, pages get pushed to storage. You can force the in-memory part of the log to disk by taking a checkpoint with CheckpointType.FoldOver.
    Badrish Chandramouli
    @badrishc
    You don't need to wrap strings the way you do, strings are handled natively by Faster, and you don't need to specify equality comparer.
    You don't see the log because you are taking a snapshot checkpoint, which writes to a separate file. You can set the checkpoint type to fold over by providing CheckpointSettings in the constructor.
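The difference between the two checkpoint types, and why the log file can stay empty, can be sketched with a toy model. This is not FASTER code: `ToyHybridLog` and its members are hypothetical names, sizes are counted in records rather than pages, and a snapshot is modeled as copying the in-memory part of the log into a separate file.

```csharp
using System;
using System.Collections.Generic;

// Toy model only (hypothetical names, not the FASTER API).
public sealed class ToyHybridLog
{
    private readonly List<string> records = new List<string>();
    private readonly int memoryCapacity; // how many records fit in main memory

    public ToyHybridLog(int memoryCapacity)
    {
        this.memoryCapacity = memoryCapacity;
    }

    public int RecordsInLogFile { get; private set; }
    public int RecordsInSnapshotFile { get; private set; }

    public void Append(string record)
    {
        records.Add(record);

        // Records spill to the log file only when memory overflows.
        int evictedUntil = records.Count - memoryCapacity;
        if (evictedUntil > RecordsInLogFile) RecordsInLogFile = evictedUntil;
    }

    // Snapshot: the in-memory part goes to a separate file; the log file is untouched.
    public void SnapshotCheckpoint() => RecordsInSnapshotFile = records.Count - RecordsInLogFile;

    // Fold-over: the in-memory part is flushed into the main log file itself.
    public void FoldOverCheckpoint() => RecordsInLogFile = records.Count;
}

public static class HybridLogDemo
{
    public static void Main()
    {
        var log = new ToyHybridLog(memoryCapacity: 1000);
        log.Append("abc -> v1");
        log.Append("def -> v1");
        log.Append("abc -> v2");

        log.SnapshotCheckpoint();
        Console.WriteLine($"log file: {log.RecordsInLogFile}, snapshot file: {log.RecordsInSnapshotFile}");

        log.FoldOverCheckpoint();
        Console.WriteLine($"log file: {log.RecordsInLogFile}");
    }
}
```

With three small writes and plenty of memory, the snapshot leaves the log file empty, which matches what was observed in the question above; only the fold-over step makes the records appear in the log file.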
    Ralph
    @ralphbecket
    Ahhh, thank you! I'll give that a go.
    Badrish Chandramouli
    @badrishc
    You also don't need to provide serializer for string type, btw.
    Ralph
    @ralphbecket
    Good to know. I'd assumed "blittable types" meant "fixed size and contiguous in memory".
    Badrish Chandramouli
    @badrishc
    Yes, strings are not blittable. But Faster handles them specially as they are very common, with default serializer and comparer provided internally.
    Ralph
    @ralphbecket
    Some kind of progress: I have now two (empty) log files on disc (I had to provide an "object log" for some reason). Unfortunately when I try reading I now get an exception. The reader code looks like this:
    internal MyThing Read(string key)
    {
        var input = default(MyThing);
        var output = default(MyThing);
        var context = default(Empty);
        var status = Session.Read(ref key, ref input, ref output, context, 0);
        return (status == Status.OK ? output : null);
    }
    That Session.Read() call throws an exception: "Overflow in AddressInfo - consider running the program in x64 mode for larger address space support". This is the first read after a few dozen small writes, so address space exhaustion seems implausible.
    Badrish Chandramouli
    @badrishc
    As the error suggests, you need to run in 64 bit mode. 32 bit mode prevents you from storing many objects. You can choose x64 in program settings.
    And yes, an object log is needed for non blittable types. See our docs at aka.ms/FASTER for details.
    soerendd
    @soerendd
    Hey. I have a question. Is it possible or planned to also query partial keys? I ask because for C++ I used to use RocksDb but now I'm in need of a performant K/V for C#.
    Norbert Haberl
    @nhaberl
    Hey, would need a read heavy KV collection as cache within asp.net core web service. I am using concurrentdictionary now but this bloats memory. Have primitive type keys and some custom objects.
    So when to use Faster in comparison to concurrentdictionary? Could faster act as a better MemoryCache then?
    Badrish Chandramouli
    @badrishc
    @nhaberl FASTER can serve this use case. It is designed to have limited memory footprint by leveraging storage for colder data. As for comparison to ConcurrentDictionary for in memory workload, see https://github.com/Microsoft/FASTER/wiki/Performance-of-FASTER-in-C%23
    @soerendd we support point operations right now. We will soon release secondary index support in the form of subset index for FASTER. See this paper for details on the subset index capability: https://badrish.net/papers/fishstore-sigmod19.pdf
    Norbert Haberl
    @nhaberl
    @badrishc thanks, I checked this ... are there any comparisons of memory consumption?
    Badrish Chandramouli
    @badrishc
    You can configure FASTER's memory as you wish, see https://microsoft.github.io/FASTER/docs/fasterkv-tuning/ for details.
    When the entire data set is in memory, without tuning much, FASTER takes up around 8 bytes of index space plus an 8-byte record header per key-value pair. You will need to configure the index size when instantiating, as well as the log memory size.
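As a rough worked example of that overhead figure, the helper below multiplies out the per-record cost. It is only back-of-the-envelope arithmetic: `FasterMemoryEstimate` is a hypothetical helper, the key and value sizes are assumed to be fixed, and adding the payload bytes on top of the 16 bytes of overhead is an assumption of this sketch rather than something stated above.

```csharp
using System;

// Hypothetical helper for rough sizing; not part of FASTER.
public static class FasterMemoryEstimate
{
    private const int IndexBytesPerRecord = 8; // approximate index space per key-value pair
    private const int RecordHeaderBytes = 8;   // record header per key-value pair

    // Approximate bytes needed when all records are in memory (assumes fixed-size keys and values).
    public static long EstimateBytes(long recordCount, int keyBytes, int valueBytes)
    {
        long perRecord = IndexBytesPerRecord + RecordHeaderBytes + keyBytes + valueBytes;
        return recordCount * perRecord;
    }

    public static void Main()
    {
        // One million records with 8-byte keys and 8-byte values:
        // 8 + 8 bytes of overhead plus 16 bytes of payload, i.e. 32 bytes per record.
        long total = EstimateBytes(1_000_000, keyBytes: 8, valueBytes: 8);
        Console.WriteLine($"{total} bytes (~{total / (1024.0 * 1024.0):F1} MiB)");
    }
}
```

Actual usage also depends on how the index size and log memory size are configured, so this estimate is a starting point for tuning rather than a measurement.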
    Ivan Trusov
    @renardeinside
    Hi everyone! I have a question - could you please share some docs or examples about how to use cloud storage (not local SSD) with FASTER? Unfortunately, I haven't found anything on the repo.
    Badrish Chandramouli
    @badrishc
    Have you seen our samples? We have one for cloud storage: https://github.com/microsoft/FASTER/tree/master/cs/samples/AzureBackedStore
    @renardeinside
    Dong Xie
    @dongx-psu
    @badrishc I have a version of the I/O layer with liburing support for FASTER and FishStore. Do you have a testing environment to test its performance, maybe?
    The PR for FishStore is already there. I will make a PR on FASTER soon as well I guess.
    Badrish Chandramouli
    @badrishc
    @dongx-psu - for FASTER, yes we can certainly test it out here on a standard workload on Linux and see if it brings any perf improvement.